
34 research outputs found

Approximately Minwise Independence with Twisted Tabulation

Author: A. Broder
A.Z. Broder
E. Cohen
M. Datar
M. Pǎtraşcu
R.E. Fan
Y. Bachrach
Publication venue
Publication date: 01/01/2014
Field of study

A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA'13] makes $c=O(1)$ lookups in tables of size $u^{1/c}$. Twisted tabulation was invented to get good concentration for hashing-based sampling. Here we show that twisted tabulation yields $\tilde O(1/u^{1/c})$-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79], $\tilde O(1/u^{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. Pǎtraşcu and Thorup [STOC'11] had shown that simple tabulation, using the same space and lookups, yields $\tilde O(1/n^{1/c})$-minwise independence, which is good for large sets, but useless for small sets. Our analysis uses some of the same methods, but is much cleaner, bypassing a complicated induction argument.

Comment: To appear in Proceedings of SWAT 201
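The lookup scheme described in this abstract can be illustrated with a minimal Python sketch. It assumes 32-bit keys split into $c=4$ characters of 8 bits each; the table layout, the parameter defaults, and the names `make_tables`, `twisted_tab` and `min_hash_element` are illustrative assumptions, not taken from the paper.

```python
import random

def make_tables(c=4, char_bits=8, out_bits=32, seed=0):
    """Fill c tables with random entries of (out_bits + char_bits) bits each."""
    rng = random.Random(seed)
    width = out_bits + char_bits
    return [[rng.getrandbits(width) for _ in range(1 << char_bits)]
            for _ in range(c)]

def twisted_tab(x, tables, char_bits=8):
    """Hash a key made of len(tables) characters of char_bits bits each."""
    mask = (1 << char_bits) - 1
    h = 0
    # Simple-tabulation step over the first c-1 characters.
    for table in tables[:-1]:
        h ^= table[x & mask]
        x >>= char_bits
    # Twist: XOR the last character with the low char_bits bits of h.
    twisted = (x ^ h) & mask
    h ^= tables[-1][twisted]
    # Drop the low bits that served as the twister.
    return h >> char_bits

def min_hash_element(items, tables):
    """Return the element of `items` with the smallest hash value."""
    return min(items, key=lambda x: twisted_tab(x, tables))

tables = make_tables(seed=1)
print(min_hash_element({3, 17, 4096, 123456789}, tables))
```

Under an $\varepsilon$-minwise function, each element of an $n$-element set is returned by `min_hash_element` with probability $(1\pm\varepsilon)/n$ over the choice of tables.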

arXiv.org e-Print Archive

CiteSeerX

Crossref

Copenhagen University Research Information System

One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers

Author: G Cormode
G Zipf
Graham Cormode
M Cafaro
M Charikar
M Thorup
Mihai Pǎtraşcu
S Das
S Muthukrishnan
Publication venue
Publication date: 02/03/2019
Field of study

Sketches are probabilistic data structures that can provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are tailored for streaming data analysis, even on architectures with limited memory such as single-board computers, which are widely used for IoT and edge computing. Since these devices offer multiple cores, efficient parallel sketching schemes allow them to handle high-volume data streams. However, since their caches are relatively small, careful parallelization is required. In this work, we focus on the frequency estimation problem and evaluate the performance of a high-end server, a 4-core Raspberry Pi and an 8-core Odroid. As a sketch, we employed the widely used Count-Min Sketch. To hash the stream in parallel and in a cache-friendly way, we applied a novel tabulation approach and rearranged the auxiliary tables into a single one. To parallelize the process efficiently, we modified the workflow and applied a form of buffering between hash computations and sketch updates. Today, many single-board computers have heterogeneous processors that combine slow and fast cores. To utilize all these cores to their full potential, we proposed a dynamic load-balancing mechanism which significantly increased the performance of frequency estimation.

Comment: 12 pages, 4 figures, 3 algorithms, 1 table, submitted to EuroPar'1
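For readers unfamiliar with the Count-Min Sketch named in this abstract, here is a minimal sequential Python sketch. It assumes non-negative integer items and uses multiply-mod-prime hashing as a stand-in for the paper's tabulation approach; the class name, parameters and defaults are illustrative assumptions, and none of the parallel or cache-aware techniques are reproduced.

```python
import random

class CountMinSketch:
    """Sequential Count-Min Sketch over non-negative integer items."""

    def __init__(self, width=256, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.prime = (1 << 61) - 1  # Mersenne prime used as the hash modulus
        # One (a, b) pair per row; a stand-in for tabulation hashing.
        self.params = [(rng.randrange(1, self.prime), rng.randrange(self.prime))
                       for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        a, b = self.params[row]
        return ((a * item + b) % self.prime) % self.width

    def update(self, item, count=1):
        """Add `count` occurrences of `item` to every row."""
        for r in range(len(self.rows)):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item):
        """Return the minimum counter over all rows (never an underestimate)."""
        return min(self.rows[r][self._index(r, item)]
                   for r in range(len(self.rows)))

sketch = CountMinSketch()
for item in [1, 2, 2, 3, 3, 3]:
    sketch.update(item)
print(sketch.estimate(3))
```

With non-negative updates, collisions can only inflate a counter, so the estimate is always at least the true frequency.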

arXiv.org e-Print Archive

Crossref

Sabanci University Research Database

Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

Author: A. Siegel
H. Karloff
J.L. Carter
J.P. Schmidt
M. Dietzfelbinger
M. Pǎtraşcu
R. Motwani
T. Christiani
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its independence: a sequence of random variables is said to be $k$-independent if every variable is uniform and every size $k$ subset is independent. In this paper we consider three classic algorithms under limited independence. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.

- Randomized quicksort. When pivot elements are computed using a $5$-independent hash function, Karloff and Raghavan, J.ACM'93 showed $O(n \log n)$ expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only $4$-independence.
- Min-wise hashing. For a set $A$, consider the probability of a particular element being mapped to the smallest hash value. It is known that $5$-independence implies the optimal probability $O(1/n)$. Broder et al., STOC'98 showed that $2$-independence implies it is $O(1/\sqrt{|A|})$. We show a matching lower bound as well as new tight bounds for $3$- and $4$-independent hash functions.
- Largest bucket. We consider the case where $n$ balls are distributed to $n$ buckets using a $k$-independent hash function and analyze the largest bucket size. Alon et al., STOC'97 showed that there exists a $2$-independent hash function implying a bucket of size $\Omega(n^{1/2})$. We generalize the bound, providing a $k$-independent family of functions that imply size $\Omega(n^{1/k})$.

Comment: Submitted to ICALP 201
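As an illustration of the notion of $k$-independence used in this abstract, the following Python sketch builds a hash function from a random polynomial of degree $k-1$ over a prime field, which is the standard $k$-independent construction, and measures the largest bucket when $n$ balls are thrown into $n$ buckets. The names `make_k_independent` and `largest_bucket` are illustrative assumptions; reducing the hash value modulo $n$ makes the bucket choice only approximately uniform, and this is not the specific family constructed in the paper.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime; keys must be smaller than this

def make_k_independent(k, seed=0):
    """Return h(x) = sum(a_i * x^i) mod PRIME with k random coefficients."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule
            acc = (acc * x + a) % PRIME
        return acc

    return h

def largest_bucket(n, k, seed=0):
    """Throw balls 0..n-1 into n buckets and return the fullest bucket's size."""
    h = make_k_independent(k, seed)
    buckets = [0] * n
    for ball in range(n):
        buckets[h(ball) % n] += 1
    return max(buckets)

print(largest_bucket(n=1000, k=2))
```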

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

The IT University of Copenhagen's Repository

Picture-Hanging Puzzles

Author: D. Rolfsen
E.D. Demaine
Ed Pegg Jr.
Erik D. Demaine
G.S. Makanin
H. Brunn
Joseph S. B. Mitchell
K. Ellul
M. Ajtai
M.S. Paterson
Martin L. Demaine
Mihai Pǎtraşcu
N.J.A. Sloane
P.G. Tait
Ronald L. Rivest
S. Theodore
T. Sillke
Yair N. Minsky
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/08/2013
Field of study

We show how to hang a picture by wrapping rope around n nails, making a polynomial number of twists, such that the picture falls whenever any k out of the n nails get removed, and the picture remains hanging when fewer than k nails get removed. This construction makes for some fun mathematical magic performances. More generally, we characterize the possible Boolean functions describing when the picture falls in terms of which nails get removed as all monotone Boolean functions. This construction requires an exponential number of twists in the worst case, but exponential complexity is almost always necessary for general functions.

Comment: 18 pages, 8 figures, 11 puzzles. Journal version of FUN 2012 pape
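The simplest case of this puzzle, where removing any single nail makes the picture fall, can be modeled with words in a free group: wrapping the rope clockwise around nail $i$ is the generator $x_i$, counterclockwise is $x_i^{-1}$, and the picture falls exactly when the word reduces to the empty word. The Python sketch below uses nested commutators, a classic solution for this case. It is an illustration under that model only: the function names are assumptions, and this naive recursion produces words of exponential length rather than the polynomial number of twists stated in the abstract.

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs; generator i is +i, its inverse is -i."""
    out = []
    for g in word:
        if out and out[-1] == -g:
            out.pop()
        else:
            out.append(g)
    return out

def inverse(word):
    return [-g for g in reversed(word)]

def commutator(a, b):
    """Return the reduced word a b a^-1 b^-1."""
    return reduce_word(a + b + inverse(a) + inverse(b))

def hang(n):
    """Word on nails 1..n that becomes trivial when any one nail is removed."""
    word = [1]
    for nail in range(2, n + 1):
        word = commutator(word, [nail])
    return word

def remove_nail(word, nail):
    """Delete every wrap around `nail`, then reduce what is left."""
    return reduce_word([g for g in word if abs(g) != nail])

print(hang(2))                  # [1, 2, -1, -2]
print(remove_nail(hang(2), 1))  # []
```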

arXiv.org e-Print Archive

CiteSeerX

Crossref

DSpace@MIT

Triangle Counting in Dynamic Graph Streams

Author: A Pagh
A Pavan
CE Tsourakakis
I Kremer
JW Berry
Konstantin Kutzkov
L Becchetti
Laurent Bulteau
LJ Carter
M Pǎtraşcu
M Thorup
MN Kolountzakis
N Alon
R Albert
R Pagh
Rasmus Pagh
S Muthukrishnan
Vincent Froese
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/07/2015
Field of study

Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered insert-only streams. We present a new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph.

Comment: New version of a SWAT 2014 paper with improved result
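To make the idea of sampling vertex triples in a dynamic stream concrete, here is a deliberately naive Python sketch. It watches a fixed collection of vertex triples, keeps a per-triple edge counter that goes up on insertions and down on deletions, and scales the fraction of complete triples by $\binom{n}{3}$. It is not the paper's algorithm: there is no sparsification, and the class name `TripleSampler` and its interface are assumptions made for illustration. It also assumes the stream is well formed (no duplicate insertions, no deletion of absent edges).

```python
from itertools import combinations
from math import comb

class TripleSampler:
    """Estimate triangles from a set of watched vertex triples."""

    def __init__(self, n, triples):
        self.n = n
        # Number of currently present edges inside each watched triple.
        self.count = {tuple(sorted(t)): 0 for t in triples}
        # Map each vertex pair to the watched triples that contain it.
        self.by_pair = {}
        for t in self.count:
            for pair in combinations(t, 2):
                self.by_pair.setdefault(pair, []).append(t)

    def update(self, u, v, delta):
        """Apply an edge insertion (delta=+1) or deletion (delta=-1)."""
        pair = (min(u, v), max(u, v))
        for t in self.by_pair.get(pair, []):
            self.count[t] += delta

    def estimate(self):
        """Scale the fraction of complete watched triples to all triples."""
        hits = sum(1 for c in self.count.values() if c == 3)
        return hits / len(self.count) * comb(self.n, 3)

# Watching every triple of a 4-vertex graph makes the estimate exact.
sampler = TripleSampler(4, combinations(range(4), 3))
for u, v in combinations(range(4), 2):
    sampler.update(u, v, +1)
sampler.update(0, 1, -1)
print(sampler.estimate())
```

In practice the triples would be a random sample, so the estimate is unbiased but its variance is large for graphs with few triangles, which is the regime the abstract's techniques are aimed at.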

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

Dynamic Compressed Strings with Random Access

Author: A. Brodnik
G. Manzini
J. Barbay
J. Jansson
M. Dietzfelbinger
M. Pǎtraşcu
P. Ferragina
R. González
R. Grossi
R. Pagh
R. Raman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

We consider the problem of storing a string S in dynamic compressed form, while permitting operations directly on the compressed representation of S: access a substring of S; replace, insert or delete a symbol in S; count how many occurrences of a given symbol appear in any given prefix of S (called rank operation) and locate the position of the ith occurrence of a symbol inside S (called select operation). We discuss the time complexity of several combinations of these operations along with the entropy space bounds of the corresponding compressed indexes. In this way, we extend or improve the bounds of previous work by Ferragina and Venturini [TCS, 2007], Jansson et al. [ICALP, 2012], and Nekrich and Navarro [SODA, 2013].
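The operations listed in this abstract can be pinned down with a reference implementation on an uncompressed Python string. This sketch only fixes the semantics (0-based positions, 1-based occurrence index for select, both being conventions assumed here); it takes linear time per operation and uses no compression, unlike the structures the paper studies.

```python
def access(s, i, j):
    """Return the substring s[i:j]."""
    return s[i:j]

def insert(s, i, symbol):
    """Return s with `symbol` inserted before position i."""
    return s[:i] + symbol + s[i:]

def delete(s, i):
    """Return s with the symbol at position i removed."""
    return s[:i] + s[i + 1:]

def rank(s, symbol, i):
    """Count occurrences of `symbol` in the prefix s[:i]."""
    return s[:i].count(symbol)

def select(s, symbol, k):
    """Return the position of the k-th occurrence (k >= 1), or -1 if absent."""
    pos = -1
    for _ in range(k):
        pos = s.find(symbol, pos + 1)
        if pos == -1:
            return -1
    return pos

s = "abracadabra"
print(rank(s, "a", 4), select(s, "a", 3))  # 2 5
```

Rank and select are inverse in the sense that `rank(s, c, select(s, c, k) + 1) == k` whenever the k-th occurrence exists.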

Crossref

Archivio della Ricerca - Università di Pisa

Leicester Research Archive

Yes, There is an Oblivious RAM Lower Bound!

Author: AC-C Yao
B Pinkas
C Gentry
Elette Boyle
Eyal Kushilevitz
I Damgård
K-M Chung
M Pǎtraşcu
Michael T. Goodrich
MT Goodrich
O Goldreich
S Devadas
S Lu
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 29/05/2018
Field of study

An Oblivious RAM (ORAM) introduced by Goldreich and Ostrovsky [JACM'96] is a (possibly randomized) RAM, for which the memory access pattern reveals no information about the operations performed. The main performance metric of an ORAM is the bandwidth overhead, i.e., the multiplicative factor extra memory blocks that must be accessed to hide the operation sequence. In their seminal paper introducing the ORAM, Goldreich and Ostrovsky proved an amortized $\Omega(\lg n)$ bandwidth overhead lower bound for ORAMs with memory size $n$. Their lower bound is very strong in the sense that it applies to the "offline" setting in which the ORAM knows the entire sequence of operations ahead of time. However, as pointed out by Boyle and Naor [ITCS'16] in the paper "Is there an oblivious RAM lower bound?", there are two caveats with the lower bound of Goldreich and Ostrovsky: (1) it only applies to "balls in bins" algorithms, i.e., algorithms where the ORAM may only shuffle blocks around and not apply any sophisticated encoding of the data, and (2) it only applies to statistically secure constructions. Boyle and Naor showed that removing the "balls in bins" assumption would result in super linear lower bounds for sorting circuits, a long standing open problem in circuit complexity. As a way of circumventing this barrier, they also proposed a notion of an "online" ORAM, which is an ORAM that remains secure even if the operations arrive in an online manner. They argued that most known ORAM constructions work in the online setting as well. Our contribution is an $\Omega(\lg n)$ lower bound on the bandwidth overhead of any online ORAM, even if we require only computational security and allow arbitrary representations of data, thus greatly strengthening the lower bound of Goldreich and Ostrovsky in the online setting. Our lower bound applies to ORAMs with memory size $n$ and any word size $r \geq 1$. The bound therefore asymptotically matches the known upper bounds when $r = \Omega(\lg^2 n)$.
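The notions of access pattern and bandwidth overhead in this abstract can be illustrated with the trivial linear-scan ORAM, which touches every block on every operation. The Python sketch below is an illustration only: the class name and interface are assumptions, encryption and re-encryption of blocks are omitted, and the construction has overhead $n$, far from the $\Omega(\lg n)$ bound discussed above.

```python
class LinearScanORAM:
    """Toy ORAM: every logical access scans all n physical blocks."""

    def __init__(self, n):
        self.blocks = [0] * n
        self.trace = []  # physical addresses touched, as the server sees them

    def access(self, op, addr, value=None):
        """Perform a "read" or "write" on logical address `addr`."""
        result = None
        for i in range(len(self.blocks)):
            self.trace.append(i)  # each block is read and written back
            if i == addr:
                result = self.blocks[i]
                if op == "write":
                    self.blocks[i] = value
        return result

a, b = LinearScanORAM(4), LinearScanORAM(4)
a.access("write", 0, 42)
a.access("read", 0)
b.access("read", 3)
b.access("write", 1, 7)
print(a.trace == b.trace)  # True: the trace does not depend on the operations
```

Each logical access costs $n$ physical block accesses here, so the bandwidth overhead is $n$.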

Crossref

Cryptology ePrint Archive

Nested Counters in Bit-Parallel String Matching

Author: G. Das
G. Myers
G. Navarro
G.M. Landau
H. Hyyrö
J. Kuri
K. Fredriksson
M. Pǎtraşcu
M.L. Fredman
R.A. Baeza-Yates
S. Grabowski
W. Chang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Crossref

Lower Bounds for Multi-Server Oblivious RAMs

Author: AC-C Yao
B Chen
B Pinkas
C Gentry
E Boyle
G Persiano
I Damgård
I Kerenidis
K-M Chung
KG Larsen
M Pǎtraşcu
M Weiss
MT Goodrich
O Goldreich
S Faber
S Lu
SD Gordon
T-HH Chan
Z Dvir
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/11/2020
Field of study

In this work, we consider the construction of oblivious RAMs (ORAM) in a setting with multiple servers where the adversary may corrupt a subset of the servers. We present an $\Omega(\log n)$ overhead lower bound for any $k$-server ORAM that limits any PPT adversary to distinguishing advantage at most $1/4k$ when only one server is corrupted. In other words, if one insists on negligible distinguishing advantage, then multi-server ORAMs cannot be faster than single-server ORAMs even with polynomially many servers of which only one unknown server is corrupted. Our results apply to ORAMs that may err with probability at most $1/128$ as well as scenarios where the adversary corrupts larger subsets of servers. We also extend our lower bounds to other important data structures including oblivious stacks, queues, deques, priority queues and search trees.

Crossref

Cryptology ePrint Archive

Lower Bounds for Encrypted Multi-Maps and Searchable Encryption in the Leakage Cell Probe Model

Author: A Boldyreva
A Hamlin
AC-C Yao
D Boneh
D Cash
G Asharov
G Persiano
I Demertzis
KG Larsen
M Bellare
M Chase
M Dietzfelbinger
M Pǎtraşcu
O Goldreich
R Curtmola
S Garg
S Kamara
S Patel
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 12/10/2020
Field of study

Encrypted multi-maps (EMMs) enable clients to outsource the storage of a multi-map to a potentially untrusted server while maintaining the ability to perform operations in a privacy-preserving manner. EMMs are an important primitive as they are an integral building block for many practical applications such as searchable encryption and encrypted databases. In this work, we formally examine the tradeoffs between privacy and efficiency for EMMs. Currently, all known dynamic EMMs with constant overhead reveal if two operations are performed on the same key or not that we denote as the global key-equality pattern. In our main result, we present strong evidence that the leakage of the global key-equality pattern is inherent for any dynamic EMM construction with $O(1)$ efficiency. In particular, we consider the slightly smaller leakage of decoupled key-equality pattern where leakage of key-equality between update and query operations is decoupled and the adversary only learns whether two operations of the same type are performed on the same key or not. We show that any EMM with at most decoupled key-equality pattern leakage incurs $\Omega(\log n)$ overhead in the leakage cell probe model. This is tight as there exist ORAM-based constructions of EMMs with logarithmic slowdown that leak no more than the decoupled key-equality pattern (and actually, much less). Furthermore, we present stronger lower bounds that encrypted multi-maps leaking at most the decoupled key-equality pattern but are able to perform one of either the update or query operations in the plaintext still require $\Omega(\log n)$ overhead. Finally, we extend our lower bounds to show that dynamic, response-hiding searchable encryption schemes must also incur $\Omega(\log n)$ overhead even when one of either the document updates or searches may be performed in the plaintext.
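The two leakage notions contrasted in this abstract can be made concrete with a small Python sketch. It represents an operation sequence as a list of `(op_type, key)` pairs and returns each pattern as the set of index pairs the adversary learns to be on the same key. This encoding and the function names are assumptions chosen for illustration, not the paper's formal definitions.

```python
def global_key_equality(ops):
    """Pairs (i, j), i < j, of operations that act on the same key."""
    return {(i, j)
            for i in range(len(ops))
            for j in range(i + 1, len(ops))
            if ops[i][1] == ops[j][1]}

def decoupled_key_equality(ops):
    """Same-key pairs restricted to operations of the same type."""
    return {(i, j) for (i, j) in global_key_equality(ops)
            if ops[i][0] == ops[j][0]}

ops = [("update", "k1"), ("query", "k1"), ("update", "k1"), ("query", "k2")]
print(sorted(global_key_equality(ops)))     # [(0, 1), (0, 2), (1, 2)]
print(sorted(decoupled_key_equality(ops)))  # [(0, 2)]
```

The decoupled pattern is always a subset of the global one, which matches the abstract's description of it as the smaller leakage.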

Crossref

Cryptology ePrint Archive